On the Worst-Case Complexity of the k-means Method

Authors

David Arthur

Sergei Vassilvitskii

Abstract

The k-means method is an old but popular clustering algorithm known for its speed and simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of k-means is superpolynomial by improving the best known lower bound from Ω(n) iterations to 2^Ω(√n). To complement this lower bound, we show a smoothed-analysis type upper bound for k-means in a sufficiently large number of dimensions.
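
To make concrete what an "iteration" means in these bounds, here is a minimal sketch of the k-means method (Lloyd's algorithm) in plain Python. It is not taken from the paper: the function name `lloyd_kmeans`, the `max_iters` cap, and the convention of counting only the rounds that change the assignment are illustrative assumptions.

```python
def lloyd_kmeans(points, centers, max_iters=1000):
    """Run Lloyd's iterations; return (centers, assignment, iterations).

    `iterations` counts the rounds in which the assignment changed
    (an illustrative convention, not the paper's definition).
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for iteration in range(1, max_iters + 1):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance; ties go to the lowest index).
        new_assignment = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Stop once the assignment is stable.
        if new_assignment == assignment:
            return centers, assignment, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        for j in range(len(centers)):
            members = [pt for pt, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment, max_iters


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, assignment, iterations = lloyd_kmeans(points, [(0, 0), (10, 0)])
print(centers, assignment, iterations)
```

On well-separated inputs like this one the method stabilizes almost immediately; the lower bound in the abstract concerns carefully constructed inputs on which the number of such rounds grows superpolynomially in n.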

Related papers

The Complexity of the k-means Method

The k-means method is a widely used technique for clustering points in Euclidean space. While it is extremely fast in practice, its worst-case running time is exponential in the number of data points. We prove that the k-means method can implicitly solve PSPACE-complete problems, providing a complexity-theoretic explanation for its worst-case running time. Our result parallels recent work on th...

Time and Space Complexity Reduction of a Cryptanalysis Algorithm

Binary Decision Diagram (in short BDD) is an efficient data structure which has been used widely in computer science and engineering. BDD-based attack in key stream cryptanalysis is one of the best forms of attack in its category. In this paper, we propose a new key stream attack which is based on ZDD(Zero-suppressed BDD). We show how a ZDD-based key stream attack is more efficient in time and ...

K2-means for Fast and Accurate Large Scale Clustering

We propose k²-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k²-means builds upon the standard k-means (Lloyd’s algorithm) and combines a new strategy to accelerate the convergence with a new low time complexity divisive initialization. The accelerated convergence is achieved through only looking at k_n nearest clusters and ...
